
# Multi-GPU Parallel Inference

Llama 3.1 405B Instruct FP8

NVIDIA's Llama 3.1 405B Instruct FP8 is a version of Meta's Llama 3.1 405B Instruct model quantized to FP8. It is an autoregressive language model built on an optimized Transformer architecture, and it may be used for both commercial and non-commercial purposes.
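Weight precision largely determines how many GPUs a model of this size needs. The sketch below is a rough weight-memory estimate, not a sizing tool. It assumes 405 billion parameters, 1 byte per FP8 weight, 2 bytes per 16-bit (BF16/FP16) weight, and weights split evenly across GPUs. It ignores KV cache, activations, and framework overhead, all of which add to real memory use. The function names are hypothetical.

```python
# Rough weight-memory estimate for serving a model across several GPUs.
# Assumptions (illustrative only): 1 byte per FP8 weight, 2 bytes per
# 16-bit weight, weights sharded evenly across GPUs, and no allowance for
# KV cache, activations, or framework overhead.

BYTES_PER_PARAM = {"fp8": 1, "fp16": 2, "bf16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Total weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def per_gpu_weight_gb(num_params: float, dtype: str, num_gpus: int) -> float:
    """Weight memory per GPU when the weights are split evenly across num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    return weight_memory_gb(num_params, dtype) / num_gpus


if __name__ == "__main__":
    params = 405e9  # assumed parameter count for Llama 3.1 405B
    for dtype in ("bf16", "fp8"):
        total = weight_memory_gb(params, dtype)
        shard = per_gpu_weight_gb(params, dtype, num_gpus=8)
        print(f"{dtype}: {total:.0f} GB total, {shard:.1f} GB per GPU on 8 GPUs")
```

Under these assumptions, 16-bit weights alone come to about 810 GB (roughly 101 GB per GPU across 8 GPUs). FP8 halves that to about 405 GB (roughly 51 GB per GPU), before any runtime memory is counted.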

Large Language Model

BLOOM DeepSpeed Inference FP16

BLOOM is an open-source multilingual large language model for text generation, developed by the BigScience project. This entry provides its weights in FP16 for use with DeepSpeed Inference.

Large Language Model



© 2025 AIbase